This script shows how to estimate spillover from single metal spots on an agarose coated slide. Each spot should be imaged with a single acquisition. The name of the acquisition should be the metal that is used: E.g. PanormaA_1_Yb176_23.txt
When run with the example data it reproduces the spillover estimation shown in Fig S5A as well as Fig 4A
library(CATALYST)
package ‘CATALYST’ was built under R version 3.4.3
library(data.table)
library(ggplot2)
library(flowCore)
library(dplyr)
library(dtplyr)
library(stringr)
source('spillover_imc_helpers.R')
# list of folders that contain each a complete single stain acquisition (e.g. in case that one wants to run and compare multiple single stains from different days)
fols_ss = c('../data/Figure_S5/Spillover_Matrix_2','../data/Figure_S5/Spillover_Matrix_1' )
# output folder
fol_out = '../data/Figure_S5/'
# name prefix for all output
prefix = paste0(Sys.Date(), '_v1_')
# load the data
list_img_ss <-lapply(fols_ss, load_ss_fol)
names(list_img_ss) <- fols_ss
CATALYST needs to have the metal names in the format (METAL)(MASS)Di
list_img_ss = lapply(list_img_ss, function(x) lapply(x, fixnames))
dats_raw = lapply(list_img_ss, imglist2dat)
This needs to be changed in case a different naming scheme is used!
for (dat in dats_raw){
dat[, metal:= strsplit(.BY[[1]], '_')[[1]][3],by=file]
dat[, mass:= as.numeric(str_extract_all(.BY[[1]], "[0-9]+")[[1]]),by=metal]
}
In the following section the raw data is visualized
dats_raw_sum = rbindlist(lapply(dats_raw, calc_file_medians),idcol = T)
Plots the median of the data. It is recommended to have >200 counts for all the channels. This is also a good plot to check if the metal spots really contain the correct metal!
dats_raw_sum %>%
ggplot(aes(x=mass, y=med, color=.id))+
facet_wrap(~file+metal, scales = 'free_y')+
geom_label(aes(label=variable), size=4)
If the median per-pixel intensities are to low, it could be worth to sum up some consecuteive pixels to get a better accuracy for the estimation (here not the case). This is valid because for segmentation based quantitative image analysis usually anyways pixels are aggregated. If the binning is choosen to big, there is however a potential accumulation of background noise.
# defines over how many pixels the aggregation should happen
# 1 = no aggregation
npixelbin = 1
dats_agg <- lapply(dats_raw, function(x) aggregate_pixels(x, n=npixelbin))
dats_agg_sum = rbindlist(lapply(dats_agg, calc_file_medians), idcol = T)
The intensities increase according to the aggregation factor
dats_agg_sum %>%
ggplot(aes(x=mass, y=med, color=.id))+
facet_wrap(~file+metal, scales = 'free_y')+
geom_label(aes(label=variable))
To estimate the spillover, the (aggregated) pixel values are first debarcoded using CATALYST, treating them like single cells. This step acts as a quality filter to remove background/noisy/weak pixels as well as pixels with artefacts (e.g. specles with strong signal in many channels). If the true metal was correctly encoded in the filename, the ‘remove_incorrect_bc’ option will check the debarcoding and remove events assigned to the wrong barcode.
Then this identified, strong single stain pixels will be used for the spillover estimation.
res = lapply(dats_agg, function(x) re_from_dat(x,
ss_ms=x[!is.na(mass), unique(mass)],
minevents = 40,
correct_bc = x[ , unique(mass)]))
Debarcoding data...
o ordering
o classifying events
Normalizing...
Computing deltas...
Computing counts and yields...
NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedDebarcoding data...
o ordering
o classifying events
Normalizing...
Computing deltas...
Computing counts and yields...
NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
sms = lapply(res, function(x) computeSpillmat(x))
for (i in seq_along(sms)){
outname = file.path(fol_out, paste0(prefix, basename(fols_ss[i]),'_sm.csv'))
write.csv(sms[[i]],file = outname)
}
for (i in seq_along(sms)){
print(names(dats_agg)[i])
ss_ms = dats_agg[[i]][!is.na(mass), unique(mass)]
p = CATALYST::plotSpillmat(ss_ms,sms[[i]])
print(p)
}
[1] "../data/Figure_S5/Spillover_Matrix_2"
We recommend that you use the dev version of ggplot2 with `ggplotly()`
Install it with: `devtools::install_github('hadley/ggplot2')`
[1] "../data/Figure_S5/Spillover_Matrix_1"
We recommend that you use the dev version of ggplot2 with `ggplotly()`
Install it with: `devtools::install_github('hadley/ggplot2')`
Here we calculate e.g. number of debarcoded events/metal, median levels of highest signal and second highest signal
for (i in seq_along(res)){
dat = dats_agg[[i]]
re = res[[i]]
name = names(dats_agg)[i]
tdat = dat %>%
mutate(bcid = bc_ids(re)) %>%
filter(bcid != '0') %>%
dplyr::select(-c(Start_push, End_push, Pushes_duration, X , Y ,Z)) %>%
melt.data.table(id.vars = c('metal', 'mass','file', 'bcid')) %>%
do(data.table(.)[, list(med=median(value), n=.N), by=.(variable, metal, mass, bcid,file)])
# find the highest metal, second highest metal
sumdat = tdat[ , .(
highestvariable = variable[med == max(med)],
highestmed = max(med),
secondhighestvariable = variable[med == sort(med,partial=length(med)-1)[length(med)-1]],
secondhighestmed = sort(med,partial=length(med)-1)[length(med)-1],
n=max(n)
) ,by=.( mass, bcid,file)]
print(sumdat)
}
res_uncor = lapply(dats_agg, function(x) re_from_dat(x,
ss_ms=x[!is.na(mass), unique(mass)],
minevents = 40,
correct_bc = NULL ))
Debarcoding data...
o ordering
o classifying events
Normalizing...
Computing deltas...
Computing counts and yields...
NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedDebarcoding data...
o ordering
o classifying events
Normalizing...
Computing deltas...
Computing counts and yields...
NaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs producedNaNs produced
sms_uncor = lapply(res_uncor, function(x) computeSpillmat(x))
Assure that the results are exactly the same when ensuring that now debarcoding error happened by using the annotation from the file names:
ndig = 8
for (i in seq_along(sms)){
print('all equal?')
print(all(round(sms_uncor[[i]], digits=ndig) == round(sms[[i]],digits = ndig)))
#diffmat = abs(round(sms_uncor[[i]], digits=ndig)-round(sms[[i]],digits = ndig))/round(sms[[i]],digits = ndig)
#match(T,diffmat > 0.01)
}
[1] "all equal?"
[1] TRUE
[1] "all equal?"
[1] TRUE
-> The results are exactly equal. Thus just debarcoding - without using any information about where the pixels actually belong - seems to be a vaild option to estimate the spillover.
plot_binplot <- function(imgs, fn, x_var, y_var, perc=0.99, nbins=100, fkt=median){
# This function makes a 'quantile' binning, binning the data in nbins that contain an equal number of events.
dat = copy(imgs[[fn]])
print(dat)
#dat[, bins:= cut(get(x_var), seq(0, quantile(get(x_var),perc),length.out = nbins), right=T, include.lowest = T)]
dat = subset(dat, get(x_var) < quantile(get(x_var),perc))
dat[, bins:= ntile(get(x_var), nbins)]
x = melt.data.table(dat, id.vars = 'bins')
x = x[, .(binmean=fkt(value)), by=.(bins, variable)]
x = dcast.data.table(x[!is.na(bins),], 'bins~variable', value.var='binmean')
ggplot(x, aes(x=get(x_var), y=get(y_var))) +
geom_smooth(method = 'lm', alpha =0.5)+
geom_point()+
xlab(x_var) +
ylab(y_var)+
expand_limits(x=0, y=0)+
stat_poly_eq(formula=as.formula('y~x'), aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE) +
theme(aspect.ratio=1)
}
Plot the spillover relationships
fn = "Er166_27_Er166_28.txt"
x_var = "Er166Di"
y_var= "Er167Di"
p = plot_binplot(pltimgs, fn, x_var, y_var, perc=0.95, nbins=20, fkt = median)
'measure.vars' [Start_push, End_push, Pushes_duration, X, ...] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced to. Check DETAILS in ?melt.data.table for more on coercion.
p= p+ggtitle('Fig4 A upper')
print(p)
fn ="Er166_27_Er166_28.txt"
x_var = "Er166Di"
y_var= "Er168Di"
pltimgs=list_img_ss[[1]]
p = plot_binplot(pltimgs, fn, x_var, y_var, perc=0.9, nbins=25, fkt = median)
'measure.vars' [Start_push, End_push, Pushes_duration, X, ...] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced to. Check DETAILS in ?melt.data.table for more on coercion.
p= p+ggtitle('Fig4 A lower')
print(p)
-> This reproduces Fig 4A